59 research outputs found

    Report on the 2015 NSF Workshop on Unified Annotation Tooling

    On March 30 & 31, 2015, an international group of twenty-three researchers with expertise in linguistic annotation convened in Sunny Isles Beach, Florida to discuss problems with and potential solutions for the state of linguistic annotation tooling. The participants comprised 14 researchers from the U.S. and 9 from outside the U.S., with 7 countries and 4 continents represented, and hailed from fields and specialties including computational linguistics, artificial intelligence, speech processing, multi-modal data processing, clinical & medical natural language processing, linguistics, documentary linguistics, sign-language linguistics, corpus linguistics, and the digital humanities. The motivating problem of the workshop was the balkanization of annotation tooling, namely, that even though linguistic annotation requires sophisticated tool support to efficiently generate high-quality data, the landscape of tools for the field is fractured, incompatible, inconsistent, and lacks key capabilities. The overall goal of the workshop was to chart the way forward, centering on five key questions: (1) What are the problems with the current tool landscape? (2) What are the possible benefits of solving some or all of these problems? (3) What capabilities are most needed? (4) How should we go about implementing these capabilities? And (5) How should we ensure longevity and sustainability of the solution? I surveyed the participants before their arrival, which provided significant raw material for ideas, and the workshop discussion itself resulted in the identification of ten specific classes of problems and five sets of most-needed capabilities. Importantly, we identified annotation project managers in computational linguistics as the key recipients and users of any solution, thereby succinctly addressing questions about the scope and audience of potential solutions. We discussed management and sustainability of potential solutions at length. The participants agreed on sixteen recommendations for future work. This technical report contains a detailed discussion of all these topics, a point-by-point review of the discussion in the workshop as it unfolded, detailed information on the participants and their expertise, and the summarized data from the surveys.

    Learning narrative structure from annotated folktales

    Thesis (Ph. D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. This electronic version was submitted by the student author. The certified thesis is available in the Institute Archives and Special Collections. Cataloged from student submitted PDF version of thesis. Includes bibliographical references (p. 97-100). Narrative structure is a ubiquitous and intriguing phenomenon. By virtue of structure we recognize the presence of Villainy or Revenge in a story, even if that word is not actually present in the text. Narrative structure is an anvil for forging new artificial intelligence and machine learning techniques, and is a window into abstraction and conceptual learning as well as into culture and its influence on cognition. I advance our understanding of narrative structure by describing Analogical Story Merging (ASM), a new machine learning algorithm that can extract culturally-relevant plot patterns from sets of folktales. I demonstrate that ASM can learn a substantive portion of Vladimir Propp's influential theory of the structure of folktale plots. The challenge was to take descriptions at one semantic level, namely, an event timeline as described in folktales, and abstract to the next higher level: structures such as Villainy, Struggle-Victory, and Reward. ASM is based on Bayesian Model Merging, a technique for learning regular grammars. I demonstrate that, despite ASM's large search space, a carefully-tuned prior allows the algorithm to converge, and furthermore it reproduces Propp's categories with a chance-adjusted Rand index of 0.511 to 0.714. Three important categories are identified with F-measures above 0.8. The data are 15 Russian folktales, comprising 18,862 words, a subset of Propp's original tales. This subset was annotated for 18 aspects of meaning by 12 annotators using the Story Workbench, a general text-annotation tool I developed for this work. Each aspect was doubly-annotated and adjudicated at inter-annotator F-measures that cluster around 0.7 to 0.8. It is the largest, most deeply-annotated narrative corpus assembled to date. The work has significance far beyond folktales. First, it points the way toward important applications in many domains, including information retrieval, persuasion and negotiation, natural language understanding and generation, and computational creativity. Second, abstraction from natural language semantics is a skill that underlies many cognitive tasks, and so this work provides insight into those processes. Finally, the work opens the door to a computational understanding of cultural influences on cognition and understanding cultural differences as captured in stories. by Mark Alan Finlayson. Ph.D.
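
The chance-adjusted Rand index mentioned in the abstract above scores how well one grouping of items agrees with another, corrected for the agreement expected by chance. The following is a minimal sketch of the standard adjusted Rand index (the Hubert-Arabie form); the function name and the toy labels are illustrative assumptions, not taken from the thesis, and the thesis may differ in how it prepares its groupings before scoring.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_true, labels_pred):
    """Adjusted Rand index between two labelings of the same items."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("both labelings must cover the same items")
    total_pairs = comb(len(labels_true), 2)
    if total_pairs == 0:
        return 1.0  # fewer than two items: the groupings agree trivially

    # Contingency counts: items sharing a (true, predicted) label pair,
    # plus the marginal group sizes of each labeling.
    cells = Counter(zip(labels_true, labels_pred))
    rows = Counter(labels_true)
    cols = Counter(labels_pred)

    # Numbers of item pairs placed together in both, in true, in predicted.
    together_both = sum(comb(c, 2) for c in cells.values())
    together_true = sum(comb(c, 2) for c in rows.values())
    together_pred = sum(comb(c, 2) for c in cols.values())

    expected = together_true * together_pred / total_pairs
    maximum = (together_true + together_pred) / 2
    if maximum == expected:
        return 1.0  # degenerate case, e.g. every item in a single group
    return (together_both - expected) / (maximum - expected)


# Hypothetical toy labels: the grouping matters, not the label names.
print(adjusted_rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
```

A score of 1 means the two groupings are identical up to renaming, and a score near 0 means agreement no better than chance.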

    Annotation Guide for the UCM/MIT Indications, Referential Expressions, and Coreference Corpus (UMIREC Corpus)

    This is the annotation guide given to the annotators who created the UCM/MIT Indications, Referring Expressions, and Coreference (UMIREC) Corpus version 1.0. The corpus comprises texts annotated for referring expressions, coreference relations between the referring expressions, and so-called "indication structures", which split referring expressions into constituents (nuclei and modifiers) and mark each constituent as either 'distinctive' or 'descriptive', indicating whether or not the constituent contains information required for uniquely identifying the referent. The contents of this corpus, the annotation procedure, and the indication structures are described in more detail in a paper titled "The Prevalence of Descriptive Referring Expressions in News and Narrative", published in the proceedings of the 48th Annual Meeting of the Association for Computational Linguistics, held in July 2010 in Uppsala, Sweden (ACL-2010).

    Analogical Retrieval via Intermediate Features: The Goldilocks Hypothesis

    Analogical reasoning has been implicated in many important cognitive processes, such as learning, categorization, planning, and understanding natural language. Therefore, to obtain a full understanding of these processes, we must come to a better understanding of how people reason by analogy. Analogical reasoning is thought to occur in at least three stages: retrieval of a source description from memory upon presentation of a target description, mapping of the source description to the target description, and transfer of relationships from source description to target description. Here we examine the first stage, the retrieval of relevant sources from long-term memory for their use in analogical reasoning. Specifically we ask: what can people retrieve from long-term memory, and how do they do it? Psychological experiments show that subjects display two sorts of retrieval patterns when reasoning by analogy: a novice pattern and an expert pattern. Novice-like subjects are more likely to recall superficially similar descriptions that are not helpful for reasoning by analogy. Conversely, expert-like subjects are more likely to recall structurally-related descriptions that are useful for further analogical reasoning. Previous computational models of the retrieval stage have only attempted to model novice-like retrieval. We introduce a computational model that can demonstrate both novice-like and expert-like retrieval with the same mechanism. The parameter of the model that is varied to produce these two types of retrieval is the average size of the features used to identify matches in memory. We find, in agreement with an intuition from the work of Ullman and co-workers regarding the use of features in visual classification (Ullman, Vidal-Naquet, & Sali, 2002), that features of an intermediate size are most useful for analogical retrieval. We conducted two computational experiments on our own dataset of fourteen formally described stories, which showed that our model gives the strongest analogical retrieval, and is most expert-like, when it uses features that are on average of intermediate size. We conducted a third computational experiment on the Karla the Hawk dataset which showed a modest effect consistent with our predictions. Because our model and Ullman's work both rely on intermediate-sized features to perform recognition-like tasks, we take both as supporting what we call the Goldilocks hypothesis: that on the average those features that are maximally useful for recognition are neither too small nor too large, neither too simple nor too complex, but rather are in the middle, of intermediate size and complexity.

    Advancing Computational Models of Narrative

    Report of a Workshop held at the Wylie Center, Beverly, MA, Oct 8-10, 2009. Sponsored by the AFOSR under MIT-MURI contract #FA9550-05-1-032

    Allopurinol versus usual care in UK patients with ischaemic heart disease (ALL-HEART): a multicentre, prospective, randomised, open-label, blinded-endpoint trial

    Funding Information: ISM reports research grants from Menarini, EMA, Sanofi, Health Data Research UK, the British Heart Foundation, and Innovative Medicines Initiative; institutional consultancy income from AstraZeneca outside the submitted work; and personal income from AstraZeneca and Amgen outside the submitted work. TMM reports grants from Menarini/Ipsen/Teijin and Merck Sharp & Dohme outside the submitted work, and personal income for consultancy from Novartis and AstraZeneca outside the submitted work, and is a trustee of the Scottish Heart Arterial Risk Prevention Society. AGB reports personal income from Novartis, Mylan, AstraZeneca, Bayer, Daiichi-Sankyo, Boehringer, Pfizer, Galderma, Zambon, and Novo-Nordisk outside the submitted work. ADS and the University of Dundee hold a European patent for the use of xanthine oxidase inhibitors in treating chest pain in angina pectoris. AW declares personal income for consultancy from AbbVie, Akcea, Albireo, Alexion, Allergan, Amarin, Apsara, Arena, Astellas, AstraZeneca, Autolus, Bayer, Biocryst, Biogen, Biomarin, Bristol Myers Squibb, Boehringer Ingelheim, Calico, Celgene, Chiesi, Daiichi Sankyo, Diurnal, Elsai, Eli Lilly, Ferring, Galapagos, Gedeon Richter, Gilead, GlaxoSmithKline, GW Pharma, Idorsia, Incyte, Intercept, Ionis, Ipsen, Janssen, Jazz, Jcyte, Kite Gilead, LEK, Leo Pharma, Les Laboratoires Servier, Lundbeck, Merck (Merck Sharp & Dohme), Merck-Serono, Mitenyi, Mundibiopharma, Mustang Bio, Mylan, Myovant, Norgine, Novartis, Novo Nordisk, Orchard, Paion, Pfizer, Pierre Fabre, PTC, RegenXBio, Rhythm, Sanofi, Santen, Sarepta, SeaGen, Shionogi, Sigmatec, SOBI, Takeda, Tanaya, UCB, and Vertex outside the submitted work. JST declares research funding from the UK National Institute for Health and Care Research (NIHR) and NHS England outside the submitted work and membership of a UK National Institute for Health and Care Excellence guideline committee on management of atrial fibrillation. All other authors declare no competing interests. Funding Information: This study was funded by the NIHR Health Technology Assessment programme (HTA 11/36/41 to ISM, IF, CJH, LW, ADS, AGB, AJA, AW, JST, and TMM). The views expressed are those of the authors and not necessarily those of the NIHR or the UK Department of Health and Social Care. The study was supported by the Scottish Primary Care Research Network, Support for Science Scotland (Grampian, Highlands, Tayside, Fife, Forth Valley, Greater Glasgow and Clyde, Lothian, Ayrshire and Arran, Dumfries and Galloway, and Lanarkshire), and the NIHR Local Clinical Research Networks (East Midlands, West Midlands, Eastern, North Thames, Yorkshire and Humber, North East and North Cumbria, North West Coast, Kent, Surrey and Sussex, and South West Peninsula), which assisted with recruitment and other study activities. We thank Public Health Scotland and NHS Digital for providing data linkage. We thank all the participants, physicians, nurses, and other staff who participated in the ALL-HEART study. Publisher Copyright: © 2022 The Author(s). Published by Elsevier Ltd. This is an Open Access article under the CC BY 4.0 license. Peer reviewed.

    Project Masihambisane: a cluster randomised controlled trial with peer mentors to improve outcomes for pregnant mothers living with HIV

    Abstract. Background: Pregnant women living with HIV (WLH) face daily challenges maintaining their own and their babies' health and mental health. Standard Prevention of Maternal to Child Transmission (PMTCT) programs are not designed to address these challenges. Methods/Design: As part of a cluster randomized controlled trial, WLH are invited to attend four antenatal and four postnatal small group sessions led by a peer WLH (a Peer Mentor). The WLH and their babies are assessed during pregnancy and at one week, six months, and twelve months post-birth. Mobile phones are used to collect routine information, complete questionnaires and remain in contact with participants over time. Pregnant WLH (N = 1200) are randomly assigned by clinic (N = 8 clinics) to an intervention program, called Masihambisane (n = 4 clinics, n = 600 WLH), or a standard care PMTCT control condition (n = 4 clinics; n = 600 WLH). Discussion: Data collection with cellular phones is innovative and effective in low-resource settings. Standard PMTCT programs are not designed to address the daily challenges faced by WLH; Peer Mentors may be useful in supporting WLH to cope with these challenges. Trial registration: ClinicalTrials.gov registration # NCT0097269
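
Assignment "by clinic" means the clinic, not the individual woman, is the unit of randomisation, so every participant at a clinic receives the same condition. A minimal sketch of a balanced 4-versus-4 allocation follows; the clinic names, the seed, and the shuffle-and-split procedure are hypothetical illustrations, since the abstract does not describe how the trial actually generated its allocation.

```python
import random


def randomise_clinics(clinics, seed):
    """Split clinics into two equal arms; all women at a clinic share its arm."""
    if len(clinics) % 2 != 0:
        raise ValueError("a balanced design needs an even number of clinics")
    rng = random.Random(seed)  # fixed seed makes the allocation reproducible
    shuffled = list(clinics)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return {
        "intervention": sorted(shuffled[:half]),
        "standard_care": sorted(shuffled[half:]),
    }


# Hypothetical clinic identifiers; the abstract only states that there are 8.
clinics = [f"clinic_{i}" for i in range(1, 9)]
allocation = randomise_clinics(clinics, seed=2024)
for arm, members in allocation.items():
    print(arm, members)
```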

    MRI-derived g-ratio and lesion severity in newly diagnosed multiple sclerosis

    Myelin loss is associated with axonal damage in established multiple sclerosis. This relationship is challenging to study in vivo in early disease. Here, we ask whether myelin loss is associated with axonal damage at diagnosis, by combining non-invasive neuroimaging and blood biomarkers. We performed quantitative microstructural MRI and single molecule ELISA plasma neurofilament measurement in 73 patients with newly diagnosed, immunotherapy-naïve relapsing-remitting multiple sclerosis. Myelin integrity was evaluated using aggregate g-ratios, derived from magnetization transfer saturation (MTsat) and neurite orientation dispersion and density imaging (NODDI) diffusion data. We found significantly higher g-ratios within cerebral white matter lesions (suggesting myelin loss) compared with normal-appearing white matter (0.61 vs 0.57, difference 0.036, 95% CI 0.029 to 0.043, p < 0.001). Lesion volume (Spearman's rho rs = 0.38, p < 0.001) and g-ratio (rs = 0.24, p < 0.05) correlated independently with plasma neurofilament. In patients with substantial lesion load (n = 38), those with higher g-ratio (defined as greater than median) were more likely to have abnormally elevated plasma neurofilament than those with normal g-ratio (defined as less than median) (11/23 [48%] versus 2/15 [13%], p < 0.05). These data suggest that, even at multiple sclerosis diagnosis, reduced myelin integrity is associated with axonal damage. MRI-derived g-ratio may provide useful additional information regarding lesion severity, and help to identify individuals with a high degree of axonal damage at disease onset. York, Martin et al. simultaneously measured g-ratio and plasma neurofilament in 73 relapsing-remitting multiple sclerosis patients at diagnosis using advanced MRI and single molecule ELISA. They demonstrate that g-ratio of cerebral white matter lesions varies at diagnosis, and show that high g-ratio of lesions is associated with elevated plasma neurofilament.
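
The abstract above does not state how the aggregate g-ratio is computed. A relation commonly used in the MRI g-ratio literature takes a myelin volume fraction (MVF) and an axon volume fraction (AVF) and returns sqrt(AVF / (AVF + MVF)), so that less myelin gives a higher g-ratio. The sketch below assumes that relation; the function name and input values are hypothetical, and the calibration that turns MTsat and NODDI measurements into volume fractions is not shown.

```python
from math import sqrt


def aggregate_g_ratio(mvf, avf):
    """Aggregate g-ratio from myelin (mvf) and axon (avf) volume fractions.

    Assumes the commonly used relation g = sqrt(avf / (avf + mvf)); this
    formula is an assumption here, not something the abstract specifies.
    """
    if mvf < 0 or avf <= 0:
        raise ValueError("mvf must be non-negative and avf must be positive")
    return sqrt(avf / (avf + mvf))


# Hypothetical voxels: the same axon fraction with less myelin in the second.
well_myelinated = aggregate_g_ratio(mvf=0.30, avf=0.30)
less_myelinated = aggregate_g_ratio(mvf=0.20, avf=0.30)
print(round(well_myelinated, 3), round(less_myelinated, 3))
```

Under this relation a voxel with no myelin at all has a g-ratio of 1, which is why higher values within lesions are read as suggesting myelin loss.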

    Allopurinol versus usual care in UK patients with ischaemic heart disease (ALL-HEART): a multicentre, prospective, randomised, open-label, blinded-endpoint trial

    BACKGROUND: Allopurinol is a urate-lowering therapy used to treat patients with gout. Previous studies have shown that allopurinol has positive effects on several cardiovascular parameters. The ALL-HEART study aimed to determine whether allopurinol therapy improves major cardiovascular outcomes in patients with ischaemic heart disease. METHODS: ALL-HEART was a multicentre, prospective, randomised, open-label, blinded-endpoint trial done in 18 regional centres in England and Scotland, with patients recruited from 424 primary care practices. Eligible patients were aged 60 years or older, with ischaemic heart disease but no history of gout. Participants were randomly assigned (1:1), using a central web-based randomisation system accessed via a web-based application or an interactive voice response system, to receive oral allopurinol up-titrated to a dose of 600 mg daily (300 mg daily in participants with moderate renal impairment at baseline) or to continue usual care. The primary outcome was the composite cardiovascular endpoint of non-fatal myocardial infarction, non-fatal stroke, or cardiovascular death. The hazard ratio (allopurinol vs usual care) in a Cox proportional hazards model was assessed for superiority in a modified intention-to-treat analysis (excluding randomly assigned patients later found to have met one of the exclusion criteria). The safety analysis population included all patients in the modified intention-to-treat usual care group and those who took at least one dose of randomised medication in the allopurinol group. This study is registered with the EU Clinical Trials Register, EudraCT 2013-003559-39, and ISRCTN, ISRCTN32017426. FINDINGS: Between Feb 7, 2014, and Oct 2, 2017, 5937 participants were enrolled and then randomly assigned to receive allopurinol or usual care. 
After exclusion of 216 patients after randomisation, 5721 participants (mean age 72·0 years [SD 6·8], 4321 [75·5%] males, and 5676 [99·2%] white) were included in the modified intention-to-treat population, with 2853 in the allopurinol group and 2868 in the usual care group. Mean follow-up time in the study was 4·8 years (1·5). There was no evidence of a difference between the randomised treatment groups in the rates of the primary endpoint. 314 (11·0%) participants in the allopurinol group (2·47 events per 100 patient-years) and 325 (11·3%) in the usual care group (2·37 events per 100 patient-years) had a primary endpoint (hazard ratio [HR] 1·04 [95% CI 0·89–1·21], p=0·65). 288 (10·1%) participants in the allopurinol group and 303 (10·6%) participants in the usual care group died from any cause (HR 1·02 [95% CI 0·87–1·20], p=0·77). INTERPRETATION: In this large, randomised clinical trial in patients aged 60 years or older with ischaemic heart disease but no history of gout, there was no difference in the primary outcome of non-fatal myocardial infarction, non-fatal stroke, or cardiovascular death between participants randomised to allopurinol therapy and those randomised to usual care. FUNDING: UK National Institute for Health and Care Research
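
The counts and percentages reported in the findings can be cross-checked with simple arithmetic. The short sketch below recomputes them from the numbers given in the abstract; the helper name `pct` and the dictionary layout are illustrative choices, and nothing here re-derives the hazard ratios or the per-patient-year rates, which need follow-up data the abstract does not provide.

```python
def pct(numerator, denominator):
    """Percentage rounded to one decimal place, as in the abstract."""
    return round(100 * numerator / denominator, 1)


# All counts below are taken directly from the abstract.
enrolled, excluded = 5937, 216
group_sizes = {"allopurinol": 2853, "usual_care": 2868}
primary_events = {"allopurinol": 314, "usual_care": 325}
deaths = {"allopurinol": 288, "usual_care": 303}

mitt_population = enrolled - excluded
print("modified intention-to-treat population:", mitt_population)
print("sum of the two groups:", sum(group_sizes.values()))

for arm, size in group_sizes.items():
    print(
        arm,
        "primary endpoint %:", pct(primary_events[arm], size),
        "all-cause death %:", pct(deaths[arm], size),
    )
```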